Abstract: The problem of channel shortening equalization for
optimal detection in ISI channels is considered. The problem
is to choose a linear equalizer and a partial response target
filter such that the combination produces the best detection
performance. Instead of using the traditional approach of MMSE
equalization, we directly seek all equalizer and target pairs that
yield optimal detection performance in terms of the sequence
or symbol error rate. This leads to a new notion of a posteriori
equivalence between the equalized and target channels with a
simple characterization in terms of their underlying probability
distributions. Using this characterization we show the surprising
existence of an infinite family of equalizer and target pairs for which
any maximum a posteriori (MAP) based detector designed for the
target channel is simultaneously MAP optimal for the equalized
channel. For channels whose input symbols have equal energy,
such as q-PSK, the MMSE equalizer designed with a monic target
constraint yields a solution belonging to this optimal family of
designs. Although these designs produce IIR target filters, the
ideas are extended to design good FIR targets. For an arbitrary
choice of target and equalizer, we derive an expression for the
probability of sequence detection error. This expression is used
to design optimal FIR targets and IIR equalizers and to quantify
the FIR approximation penalty.

Index Terms: Intersymbol interference, linear equalization,
channel shortening, partial response, target design, MAP detection,
decision feedback.



I. Introduction 

The problem of designing channel shortening equalizers 
for maximum-likelihood sequence detection in inter-symbol 
interference (ISI) channels has been widely studied [1-5]. The 
function of the equalizer is to modify the channel response to 
reduce the length of the ISI in the system thereby reducing 
the complexity of the sequence detector. Traditionally, the 
equalizer is designed so that the equalized channel response 
approximates a pre-specified short FIR sequence called the 
partial response (PR) target. Two commonly studied classes 
of equalizers are the zero-forcing equalizer (ZFE) and the 
minimum mean-squared error (MMSE) equalizer. The ZFE 
forces the equalized channel response to match the target 
response exactly. The undesired effect of zero forcing is 
that it colors the noise spectrum and may amplify the noise 
significantly. In contrast, the MMSE equalizer minimizes the 
variance of the equalization error, but the error is signal 
dependent. In both cases the goal is to make the equalized 
channel response close to the target response. However, the 
ultimate goal of the channel shortening equalization ought to 
be a detection performance measure such as the sequence or 
symbol error rate. 

The authors are with Seagate Research, Pittsburgh, PA 15222. Email:

In this work we revisit the problem of equalizer design in
the context of optimal (MAP) detection of the input. The main 
contribution of this paper is a new perspective for the problem 
of channel shortening equalization in terms of the underlying 
a posteriori probabilities (APPs) rather than the traditional 
approach of using the MMSE equalization error as the criterion 
[1-6]. We pose the question: In what sense should the target 
channel be equivalent to the equalized channel to achieve best 
detection performance? The answer to this question naturally 
leads us to a new notion of a posteriori equivalence (APE) 
between the equalized channel and the target channel. We 
show that this form of equivalence, which is expressed in terms 
of their underlying a posteriori probabilities, guarantees no 
performance loss due to equalization compared to the optimal 
detector for the original channel. This result thus provides a 
new recipe for equalizer and target design, which is different 
from the heuristic approach of matching the responses of the 
target and the equalized channel. We also prove that there is 
a family of IIR equalizers and targets which guarantee APE. 



This paper is organized as follows. In Sections II and
III we review the background material on optimal sequence
detection and linear equalization. In Section IV we consider
the problem of sequence detection for the equalized channel.
We present our main theoretical results including a posteriori
equivalence and its algebraic characterization. In Section V
we consider practical implications of our results. In particular,
we show that the MMSE equalizer designed with a monic
target constraint yields an optimal solution for ISI channels
when the input symbols have equal energy. Unfortunately,
the equivalence conditions usually hold only for IIR targets,
which limits the direct usefulness of the results for channel
shortening. However, in Section VI we extend the results to FIR target
design where we seek the best FIR target and IIR equalizer
with a small but acceptable "FIR approximation penalty." We
derive an expression for the sequence detection error rate, and
use this as a performance measure for the filter design. For
simplicity of the analysis we consider only IIR equalizers. The
problem of FIR equalizer design would entail the additional
task of optimizing the processing delay. We refer the reader to
[6-8] for the problem of optimizing the processing delay for
systems using MMSE equalization. A similar analysis related
to FIR equalizer design would be equally important in our
problem, but is beyond the scope of this paper. Finally, in
Section VII we apply our theory to an example ISI channel
with binary and non-binary inputs to confirm our predictions
through computer simulation.

A. Definitions and Notation

Let $a$ denote a discrete-time sequence $\{a_n : n \in \mathbb{Z}\}$. If $a$
has finite energy, its discrete-time Fourier transform is defined
as

$$ \mathcal{F}\{a\} = A(\omega) = \sum_n a_n e^{-j\omega n} . $$

The convolution of two sequences $a$ and $b$ is denoted by $c = a * b$:

$$ c_n = \sum_m a_m b_{n-m} . $$

Let $\delta$ denote the discrete delta function: $\delta_n = 0$ for $n \neq 0$
and $\delta_0 = 1$. Define the inner product between two sequences
$a$ and $b$ as

$$ \langle a, b \rangle = \sum_n a_n^* b_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} A^*(\omega) B(\omega)\, d\omega $$

where $*$ denotes complex conjugation for scalars or
conjugate-transposition for matrices. Thus, the norm of $a$ is

$$ \|a\| = \langle a, a \rangle^{1/2} . $$



Given a sequence $a$, let $\tilde{a}$ be obtained by time-reversal and
conjugation of $a$, i.e.,

$$ \tilde{a}_n = a_{-n}^* . $$

The Fourier transform of $\tilde{a}$ is $A^*(\omega)$. Thus, we readily obtain
the following identity:

$$ \langle a * b, c \rangle = \langle b, \tilde{a} * c \rangle \tag{1} $$

i.e., the adjoint of the convolution operation with $a$ is
convolution with $\tilde{a}$.

Let $x$ and $y$ denote real or complex stationary random
processes. The cross-correlation function is defined by

$$ r_{xy}[n] = \frac{1}{\zeta}\, E\big(x_{m+n}\, y_m^*\big) $$

where $E(\cdot)$ denotes expectation and $\zeta$ is the number of real
dimensions per sample, i.e., $\zeta = 1$ for real processes and
$\zeta = 2$ for complex ones. The autocorrelation of $x$ is obtained
by setting $y = x$. The power spectral density of $x$ is
$S_x(\omega) = \mathcal{F}\{r_{xx}\}$. We write $x \perp y$ if $r_{xy} = 0$.

B. ISI Channel Model

Consider the following discrete-time model for a real or
complex-valued linear time invariant system

$$ y = h * x + w \tag{2} $$

where $x = \{x_m\}$ is the input to the channel, $h = \{h_m\}$
is the channel impulse response and $w = \{w_n\}$ is additive
white Gaussian noise with $S_w(\omega) = \sigma_w^2$. Assume that $h$
has finite energy but is possibly non-causal and infinite. The
channel model (2) is usually the base-band representation
after whitened matched filtering [9] and describes a variety
of communication systems.

In the case of complex channels, the noise is assumed to
be circularly symmetric. Thus, the real and imaginary
components of the noise samples are independent with variance
$\sigma_w^2$. Let the input power spectral density be $S_x(\omega)$. As a
special case we shall also consider independent and identically
distributed (IID) inputs with $S_x(\omega) = 1$. An example for the
input symbol set is the Q-phase PSK constellation,

$$ \mathcal{C} = \{\sqrt{2}\, e^{j 2\pi q / Q} : q = 0, \ldots, Q-1\} $$

in the complex case or the BPSK (bipolar binary) constellation
$\mathcal{C} = \{-1, +1\}$ in the real case.

II. Optimal Sequence Detection

Suppose that a message $x = \{x_m : m = 0, \ldots, M-1\}$
of finite length $M$ symbols is transmitted through the channel
(2). The received signal is given by

$$ y_n = \sum_{m=0}^{M-1} h_{n-m} x_m + w_n . \tag{3} $$

Since the additive noise is white Gaussian, we have

$$ P(y|x) \propto \exp\left(-\frac{D(y,x)}{2\sigma_w^2}\right) \tag{4} $$

where

$$ D(y,x) = \sum_n \Big| y_n - \sum_{m=0}^{M-1} h_{n-m} x_m \Big|^2 \tag{5} $$

with the summation over $n$ carried over the finite region of
interest where the samples $y_n$ are available.

Given the output sequence $y$, the maximum a posteriori
(MAP) estimate of $x$ is given by

$$ \hat{x} \overset{\mathrm{def}}{=} \arg\max_x P(x|y) = \arg\max_x P(y|x) P(x) = \arg\min_x \left( \frac{D(y,x)}{2\sigma_w^2} - \log P(x) \right) \tag{6} $$

where $P(x)$ is the prior probability distribution on $x$. If
this distribution is uniform, then (6) reduces to
maximum-likelihood (ML) detection of the input sequence:

$$ \hat{x} = \arg\min_x D(y,x) = \arg\min_x \|y - h * x\|^2 . \tag{7} $$

Unfortunately, the direct use of the above expression is
limited due to its computational complexity, which grows
exponentially with the length of the ISI. However, when $h_n$
is a short FIR sequence, the above cost function can be
minimized exactly and computationally efficiently using the
Viterbi algorithm, which was originally devised to decode
convolutional codes [9-11].
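The equivalence between exhaustive minimization of (7) and the Viterbi recursion can be illustrated with a minimal sketch. It assumes a real causal FIR channel, BPSK inputs, and zero symbols outside the message; the function names `noiseless_output`, `cost`, `brute_force_ml`, and `viterbi_ml` are hypothetical and chosen only for this illustration.

```python
import itertools
import random


def noiseless_output(h, x, n):
    """(h * x)_n for a causal FIR h and a finite message x (zero elsewhere)."""
    return sum(h[n - m] * x[m] for m in range(len(x)) if 0 <= n - m < len(h))


def cost(h, x, y):
    """D(y, x) = sum_n |y_n - (h * x)_n|^2 over the available samples, as in (5)."""
    return sum((y[n] - noiseless_output(h, x, n)) ** 2 for n in range(len(y)))


def brute_force_ml(h, y, M):
    """Exhaustive search over all 2^M BPSK messages."""
    return min(itertools.product((-1, 1), repeat=M), key=lambda x: cost(h, x, y))


def viterbi_ml(h, y, M):
    """Minimize D(y, x) with the Viterbi algorithm.

    The trellis state is the tuple of the last L-1 inputs (most recent first).
    Inputs before time 0 and after time M-1 are taken to be zero.
    """
    L = len(h)
    survivors = {(0,) * (L - 1): (0.0, ())}  # state -> (path metric, decisions)
    for n in range(M + L - 1):
        symbols = (-1, 1) if n < M else (0,)  # zero tail flushes the channel memory
        updated = {}
        for state, (metric, path) in survivors.items():
            for s in symbols:
                window = (s,) + state  # x_n, x_{n-1}, ..., x_{n-L+1}
                predicted = sum(hk * xk for hk, xk in zip(h, window))
                candidate = metric + (y[n] - predicted) ** 2
                nxt = window[:-1]
                if nxt not in updated or candidate < updated[nxt][0]:
                    updated[nxt] = (candidate, path + (s,))
        survivors = updated
    metric, path = min(survivors.values(), key=lambda t: t[0])
    return path[:M], metric


if __name__ == "__main__":
    rng = random.Random(1)
    h = [1.0, 0.5, -0.2]  # hypothetical short channel
    x = [rng.choice((-1, 1)) for _ in range(8)]
    y = [noiseless_output(h, x, n) + rng.gauss(0.0, 0.5)
         for n in range(len(x) + len(h) - 1)]
    print(viterbi_ml(h, y, len(x))[0] == brute_force_ml(h, y, len(x)))
```

The exhaustive search visits all $2^M$ messages, whereas the trellis search keeps one survivor per state, so its work grows linearly in $M$ and exponentially only in the channel length.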

III. Review of Linear Equalization 

In order to implement the Viterbi algorithm to solve the
ML sequence detection problem (7) with manageable complexity, we
need to reduce the length of the ISI in the system. This is
usually accomplished by using a linear equalizer to condition
the channel response to match a pre-specified target response.
When the target is a short FIR filter, it is called a partial
response (PR) target. The Viterbi detector operates on the
equalized channel to perform sequence detection pretending
that the samples were the output of a hypothetical target
channel.

Let $f = \{f_n\}$ and $g = \{g_n\}$ denote the equalizer and target
filters, respectively. For the moment assume that the target is
fixed. Fig. 1 illustrates the system with an equalizer whose
output is

$$ z = f * y = f * h * x + f * w = \ell * x + u \tag{8} $$

where $\ell = f * h$ is the response of the equalized channel
and $u = f * w$ is the output noise whose power spectral
density is $S_u(\omega) = |F(\omega)|^2 S_w(\omega) = \sigma_w^2 |F(\omega)|^2$.

Definition 1: The target channel is a hypothetical channel
defined by

$$ z = g * x + v \tag{9} $$

where $x$ is the input, $v$ is additive white Gaussian noise with
$S_v(\omega) = \sigma_v^2$, and $z$ is the output.

The original channel with the equalizer is illustrated in
Fig. 1 and the target channel that approximates it is shown
in Fig. 2. Traditionally, the equalizer and target are designed
to make the equalized channel response $\ell$ close to the target $g$,
while keeping the noise white.

Fig. 1. The equalized channel

Fig. 2. The target channel



A. Zero Forcing Equalizer (ZFE) 



The ZFE modifies the channel response to match the target
filter exactly, i.e., $\ell = g$. Thus, in the frequency domain, the
equalizer is given by

$$ F(\omega) = \frac{G(\omega)}{H(\omega)} . \tag{10} $$

The spectral density of the noise $u$ is

$$ S_u(\omega) = |F(\omega)|^2 S_w(\omega) = \sigma_w^2 \frac{|G(\omega)|^2}{|H(\omega)|^2} . \tag{11} $$

An undesirable problem with zero-forcing equalization is that
when the channel response $|H(\omega)|$ has a spectral null or attains
very small values, the equalized noise is highly colored and
has large variance. The ZFE is rarely used for this reason.



B. Minimum Mean Squared Error (MMSE) Equalizer 

A widely used equalizer in practical systems is the MMSE
equalizer, which is designed to minimize the variance of the
equalization error $e$ defined as

$$ e \overset{\mathrm{def}}{=} g * x - f * y . \tag{12} $$

The MMSE equalizer ensures that $e \perp y$, which yields

$$ F(\omega) = \frac{S_x(\omega) H^*(\omega) G(\omega)}{|H(\omega)|^2 S_x(\omega) + \sigma_w^2} \tag{13} $$

where $S_x(\omega)$ is the power spectral density of the input $x$.
The spectral density of the estimation error is given by

$$ S_e(\omega) = \frac{|G(\omega)|^2 S_x(\omega)\, \sigma_w^2}{|H(\omega)|^2 S_x(\omega) + \sigma_w^2} . \tag{14} $$

The advantage of the MMSE design over the ZFE is that the
spectrum of the MMSE noise (14) is less colored and always
smaller than the ZFE noise (11), and spectral nulls in $H(\omega)$
cause no problems. However, $e$ is signal dependent, which
may cause the Viterbi detection to be suboptimal.
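The comparison between the ZFE noise spectrum (11) and the MMSE error spectrum (14) can be made concrete with a short numerical sketch. The channel and target taps below are hypothetical values chosen so that the channel has a near-null at $\omega = \pi$; the helper names are likewise assumptions made for this illustration.

```python
import cmath
import math


def power_response(taps, omega):
    """|A(omega)|^2 for a causal FIR sequence with the given taps."""
    return abs(sum(t * cmath.exp(-1j * omega * n) for n, t in enumerate(taps))) ** 2


def zfe_noise_psd(h, g, sigma2, omega):
    """Equalized noise spectrum (11): sigma_w^2 |G|^2 / |H|^2."""
    return sigma2 * power_response(g, omega) / power_response(h, omega)


def mmse_error_psd(h, g, sigma2, omega, sx=1.0):
    """Equalization error spectrum (14) for input spectral density sx at omega."""
    h2 = power_response(h, omega)
    return power_response(g, omega) * sx * sigma2 / (h2 * sx + sigma2)


if __name__ == "__main__":
    h = [1.0, 0.95]  # hypothetical channel with a near-null at omega = pi
    g = [1.0, 0.5]   # hypothetical fixed target
    for omega in (0.0, math.pi / 2, math.pi):
        print(omega, zfe_noise_psd(h, g, 0.1, omega), mmse_error_psd(h, g, 0.1, omega))
```

The ratio of (14) to (11) is $|H|^2 S_x / (|H|^2 S_x + \sigma_w^2)$, which is below one at every frequency and tends to zero near a spectral null; this is the sense in which the MMSE error is always smaller than the ZFE noise.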

C. Target Design 

Instead of choosing a fixed target, we seek the best target
of a fixed length. In practice, the target is usually designed for
an MMSE equalizer. Thus, we minimize the variance of the
MMSE equalization error $e$:

$$ \min_g \frac{1}{2\pi} \int_{-\pi}^{\pi} S_e(\omega)\, d\omega \tag{15} $$

where the target $g$ is assumed to have length $L$:
$g = \{g_0, g_1, \ldots, g_{L-1}\}$.

The resulting cost function is a simple quadratic function of
the target filter taps. Clearly, with no further constraints on $g$
we obtain the trivial solution $g = 0$. Therefore, an additional
constraint is imposed on $g$ such as the unit-energy constraint

$$ \sum_n |g_n|^2 = 1 \tag{16} $$

or the monic constraint

$$ g_0 = 1 \tag{17} $$

or sometimes the unit-tap constraint $g_k = 1$ for some $k$.
In each of these cases, the optimal target, known as the
generalized partial response (GPR) target, is found easily by
solving (15) subject to the appropriate constraints.

For illustrative purposes, we derive the solution to the monic
design in the IIR limit ($L \to \infty$), where the problem can be
expressed in the frequency domain as

$$ \min_g \frac{1}{2\pi} \int_{-\pi}^{\pi} \frac{|G(\omega)|^2 S_x(\omega)\, \sigma_w^2}{|H(\omega)|^2 S_x(\omega) + \sigma_w^2}\, d\omega \tag{18} $$

over all causal targets $g$ with $g_0 = 1$.

The causal and monic constraint on $g$ is cumbersome to
express directly in the frequency domain. However, we know
that among all the causal and stable spectral factors of
$Q(\omega) = |G(\omega)|^2$ the value of $g_0$ is maximized for the
minimum-phase factor [12]. This maximum value is given by

$$ \log g_0^2 = \frac{1}{2\pi} \int_{-\pi}^{\pi} \log Q(\omega)\, d\omega . $$

Therefore, we rewrite the optimization (18) in terms of $Q(\omega)$:

$$ \min_Q \frac{1}{2\pi} \int_{-\pi}^{\pi} \frac{Q(\omega) S_x(\omega)\, \sigma_w^2}{|H(\omega)|^2 S_x(\omega) + \sigma_w^2}\, d\omega $$

such that

$$ \frac{1}{2\pi} \int_{-\pi}^{\pi} \log Q(\omega)\, d\omega = 0 . \tag{19} $$



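The Lagrangian

$$ \mathcal{L}(Q, \lambda) = \int_{-\pi}^{\pi} \frac{Q(\omega) S_x(\omega)\, \sigma_w^2}{|H(\omega)|^2 S_x(\omega) + \sigma_w^2}\, d\omega - \lambda \int_{-\pi}^{\pi} \log Q(\omega)\, d\omega $$

is stationary at the solution. Using calculus of variations, we
obtain

$$ |G(\omega)|^2 = Q(\omega) = \lambda\, \frac{|H(\omega)|^2 S_x(\omega) + \sigma_w^2}{\sigma_w^2 S_x(\omega)} = \frac{\lambda}{\sigma_w^2} \left( |H(\omega)|^2 + \frac{\sigma_w^2}{S_x(\omega)} \right) \tag{20} $$

where the Lagrange multiplier $\lambda$ is chosen to satisfy (19).
The optimal $G(\omega)$ is the causal minimum-phase spectral factor
of $Q(\omega)$, and the MMSE equalizer (13) reduces to

$$ F(\omega) = \frac{\lambda}{\sigma_w^2}\, \frac{H^*(\omega)}{G^*(\omega)} . \tag{21} $$

The spectrum of the estimation error (14) is white for this
solution:

$$ S_e(\omega) = \lambda . \tag{22} $$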
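As a numerical illustration of (19), (20) and (22), the following sketch computes the Lagrange multiplier by averaging the logarithm of the bracketed term in (20) over a frequency grid, and then evaluates the error spectrum (14) at the resulting target. The midpoint grid, the FIR channel representation, and the function names are assumptions made for this example only.

```python
import cmath
import math


def power_response(taps, omega):
    """|H(omega)|^2 for a causal FIR channel with the given taps."""
    return abs(sum(t * cmath.exp(-1j * omega * n) for n, t in enumerate(taps))) ** 2


def monic_lambda(h, sigma2, sx=lambda omega: 1.0, points=4096):
    """Lagrange multiplier fixed by (19): log Q from (20) must average to zero."""
    grid = [-math.pi + 2 * math.pi * (k + 0.5) / points for k in range(points)]
    mean_log = sum(math.log(power_response(h, w) + sigma2 / sx(w)) for w in grid) / points
    return sigma2 * math.exp(-mean_log)


def target_power(h, sigma2, lam, omega, sx=lambda omega: 1.0):
    """Q(omega) = |G(omega)|^2 from (20)."""
    return (lam / sigma2) * (power_response(h, omega) + sigma2 / sx(omega))


def error_psd(h, sigma2, lam, omega, sx=lambda omega: 1.0):
    """S_e(omega) from (14) evaluated at the monic target (20)."""
    q = target_power(h, sigma2, lam, omega, sx)
    s = sx(omega)
    return q * s * sigma2 / (power_response(h, omega) * s + sigma2)


if __name__ == "__main__":
    h = [1.0, 0.5]  # hypothetical two-tap channel
    lam = monic_lambda(h, 0.1)
    print(lam, [error_psd(h, 0.1, lam, w) for w in (0.0, 1.0, 3.0)])
```

For a two-tap channel with white input the average of the logarithm has a closed form, which gives an independent check of the value of $\lambda$ returned by the grid average.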



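Henceforth, we refer to this solution as the monic design or
monic solution, implicitly associating the optimal target with
the MMSE equalizer. For the special case of zero-mean IID
inputs with $S_x(\omega) = 1$, the above solution reduces to

$$ F(\omega) = \frac{H^*(\omega) G(\omega)}{|H(\omega)|^2 + \sigma_w^2} \tag{23} $$

$$ |G(\omega)|^2 = \frac{\lambda}{\sigma_w^2} \left( |H(\omega)|^2 + \sigma_w^2 \right) . \tag{24} $$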



Coincidentally, this solution is related to the linear MMSE 
decision feedback equalizer (DFE) for the given ISI channel 
[13-17]. The MMSE-DFE structure is optimal in achieving 
the capacity for an ISI channel with additive white Gaussian 
noise [16-19]. However, it is not immediately obvious or even 
always true that the above equalizer and target filters would 
be optimal for sequence detection of (non-Gaussian) input 
symbols. 

As a caveat we reiterate that the sequence detection is not
meant to be implemented with decision feedback. We still
use the Viterbi algorithm or a MAP based algorithm such
as the forward-backward algorithm to compute the symbol
a posteriori probabilities (APPs). It has been observed that the
monic design performs better in detection than other design
criteria such as the energy constraint (16) or the unit-tap
constraint on the target. In the following section, we shall
formally prove this conjecture.

In practice, we need to design FIR equalizers and targets
with unknown channel and noise characteristics. In this case
the second order statistics of the channel input and output
are estimated using training and subsequently used to design
FIR filters. The solutions to these problems for the various
target constraints are described in [1,6,20]. We point out that
this method is also applicable if the noise is colored because
the design ensures that the noise whitening is automatically
absorbed into the equalizer $f$.

IV. Sequence Detection for the Equalized Channel

Traditionally, the sequence detection is performed in two
steps. The first step is to equalize the channel output. The
next step is to perform the detection pretending that the
equalizer output $z$ (Fig. 1) were the output of the hypothetical
target channel (Fig. 2). In other words, although the sequence
detector is optimally designed for the target channel it is, in
reality, applied to the equalized channel. In this section we
consider the performance of such a detector. For simplicity of
analysis we assume that the target and equalizer are IIR and
the target is causal. We consider the design of FIR targets in
Section VI.

Consider the system described by (8), restated below:

$$ z = \ell * x + f * w . $$

By design, the above channel approximates the target channel
(9). The conditional probability of the output of the target
channel is

$$ P(z|x) \propto \exp\left(-\frac{D(z,x)}{2\sigma_v^2}\right) \tag{25} $$

where

$$ D(z,x) = \sum_n \Big| z_n - \sum_{m=0}^{M-1} g_{n-m} x_m \Big|^2 \tag{26} $$

with the summation over $n$ carried over a finite region of
interest where the samples of $z$ are available. The following
result provides an alternate expression for $D(z,x)$ which will
be useful in proving a form of equivalence between the target
channel and the equalized channel.

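Lemma 1: Suppose the equalizer $f$ and target $g$ are chosen
such that $\tilde{g} * f = \alpha \tilde{h}$ for some $\alpha > 0$. Then

$$ D(z,x) - \|z\|^2 = \langle x, s * x \rangle + \alpha \left( D(y,x) - \|y\|^2 \right) $$

where $s = \tilde{g} * g - \alpha\, \tilde{h} * h$.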

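Proof: We begin by expanding $D(z,x)$ as follows:

$$ D(z,x) = \|z - g * x\|^2 $$
$$ = \|z\|^2 - 2\Re\langle g * x, z \rangle + \langle g * x, g * x \rangle $$
$$ = \|z\|^2 - 2\Re\langle x, \tilde{g} * f * y \rangle + \langle x, \tilde{g} * g * x \rangle $$

where $\Re$ denotes the real part. The last step follows by
applying (1) and using $z = f * y$. Using the hypothesis that
$\tilde{g} * f = \alpha \tilde{h}$ we obtain

$$ D(z,x) = \|z\|^2 - 2\alpha \Re\langle \tilde{h} * y, x \rangle + \langle x, \tilde{g} * g * x \rangle . \tag{27} $$

Meanwhile, a similar argument shows that

$$ D(y,x) = \|y - h * x\|^2 = \|y\|^2 - 2\Re\langle \tilde{h} * y, x \rangle + \langle \tilde{h} * h * x, x \rangle . \tag{28} $$

From (27) and (28), we obtain the desired result

$$ D(z,x) - \|z\|^2 = \langle x, s * x \rangle + \alpha \left( D(y,x) - \|y\|^2 \right) $$

where $s = \tilde{g} * g - \alpha\, \tilde{h} * h$. ■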
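Lemma 1 is a purely algebraic identity and can be checked numerically for finitely supported sequences. The sketch below assumes real sequences stored as index-to-value dictionaries and tests two simple FIR pairs that satisfy the hypothesis: the matched filter with a delta target, and a scaled identity equalizer with the channel itself as target. All function names are hypothetical.

```python
import random


def conv(a, b):
    """Convolution of finitely supported real sequences stored as {index: value}."""
    c = {}
    for i, ai in a.items():
        for j, bj in b.items():
            c[i + j] = c.get(i + j, 0.0) + ai * bj
    return c


def reverse(a):
    """Time reversal; conjugation is a no-op for the real sequences used here."""
    return {-n: v for n, v in a.items()}


def combine(a, b, scale=1.0):
    """Return a + scale * b."""
    c = dict(a)
    for n, v in b.items():
        c[n] = c.get(n, 0.0) + scale * v
    return c


def inner(a, b):
    """Inner product <a, b> of two real sequences."""
    return sum(v * b.get(n, 0.0) for n, v in a.items())


def distance(out, response, x):
    """D(out, x) = ||out - response * x||^2."""
    r = combine(out, conv(response, x), -1.0)
    return inner(r, r)


def lemma1_sides(h, f, g, alpha, x, y):
    """Both sides of Lemma 1, assuming reverse(g) * f = alpha * reverse(h)."""
    z = conv(f, y)
    s = combine(conv(reverse(g), g), conv(reverse(h), h), -alpha)
    lhs = distance(z, g, x) - inner(z, z)
    rhs = inner(x, conv(s, x)) + alpha * (distance(y, h, x) - inner(y, y))
    return lhs, rhs


if __name__ == "__main__":
    rng = random.Random(0)
    h = {0: 1.0, 1: 0.5, 2: -0.3}
    y = {n: rng.gauss(0.0, 1.0) for n in range(-2, 10)}  # any received sequence
    x = {n: rng.choice((-1.0, 1.0)) for n in range(6)}   # any candidate input
    print(lemma1_sides(h, reverse(h), {0: 1.0}, 1.0, x, y))
```

Note that the identity holds for every candidate $x$ and every received $y$, not only for the transmitted sequence, which is what allows it to be used inside the a posteriori probabilities.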



A. Equivalence of Equalized Channel and Target Channel 

We now interpret Lemma 1 in terms of the underlying
probability distributions. Let upper-case letters denote random
variables and lower-case letters denote realizations of these
random variables. Suppose that $F(\omega)$ is a stable filter, i.e.,
it has no spectral nulls or singularities. Then, $z = f * y$ is
invertible. Hence, for the equalized channel

$$ P(x|z) = P(x|y) \propto P(x) P(y|x) \propto P(x) \exp\left(-\frac{D(y,x)}{2\sigma_w^2}\right) $$

where the constants of proportionality above (and henceforth)
are always independent of $x$. Using Lemma 1 and noting that
$y$ and $z$ are constants, we obtain

$$ P(x|z) \propto P(x) \exp\left(-\frac{D(z,x)}{2\alpha\sigma_w^2} + \frac{\langle x, s * x \rangle}{2\alpha\sigma_w^2}\right) . \tag{29} $$

Suppose that the hypothetical target channel is assigned an
input prior distribution $\tilde{P}(x)$ which is possibly different from
$P(x)$. The a posteriori probability of $x$ is

$$ P_T(x|z) \propto \tilde{P}(x) P(z|x) \propto \tilde{P}(x) \exp\left(-\frac{D(z,x)}{2\sigma_v^2}\right) . \tag{30} $$

Comparing (29) and (30), we see that by setting the noise
variance $\sigma_v^2$ and the input prior distribution $\tilde{P}(x)$ of the target
channel (9) to

$$ \sigma_v^2 \overset{\mathrm{def}}{=} \alpha \sigma_w^2 , \tag{31} $$

$$ \tilde{P}(x) \propto P(x) \exp\left(\frac{\langle x, s * x \rangle}{2\sigma_v^2}\right) \tag{32} $$

we ensure that the a posteriori PDFs for the equalized and
target channels are equal:

$$ P(X = x \,|\, Z = z) = P_T(X = x \,|\, Z = z) $$

with the understanding that the left-hand side is the APP
corresponding to the equalized ISI channel (8) with a prior
$P(x)$ on $x$, while the right-hand side is the APP corresponding
to the target channel (9) with input PDF $\tilde{P}(x)$.

Remark 1: We reiterate that the target channel is a
hypothetical channel and we are free to treat its parameters $g$
and $\sigma_v^2$ as well as its input PDF $\tilde{P}(x)$ as design parameters.
We assume neither that $f$ is the MMSE equalizer designed
for the target $g$ nor that $\sigma_v^2$ is the variance of the equalization
error. Although this approach is radically different from the
traditional approach in the literature on channel shortening
equalization [1, 3, 6], it is essential to derive the correct form of
equivalence between the target and equalized channels defined
below.

Definition 2: The equalized channel is equivalent to the
target channel in the a posteriori sense if they produce the
same a posteriori probability for the input given the output.
This form of equivalence is called a posteriori equivalence
(APE).

Evidently, this definition of equivalence is the most natural
one from the perspective of MAP detection. As a caveat, we
point out that $P_T(Z = z)$ and $P(Z = z)$ need not be equal,
i.e., the equalizer output $z$ would not be a typical output of the
target channel. The above observation may be stated succinctly
as follows:

Theorem 1: The equalized channel (8) with the prior
distribution $P(x)$ and the target channel (9) with the prior
distribution $\tilde{P}(x)$ are a posteriori equivalent.

In general, the MMSE or ZFE equalizers do not guarantee
this form of equivalence even though they attempt to make the
equalized channel response close to the target response.
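A posteriori equivalence can be observed numerically in a small toy setting. The sketch below makes several simplifying assumptions that are not part of the analysis above: convolutions are circular (so that every filter is represented by its DFT), the target is the zero-phase spectral factor of $\alpha(|H|^2 + \beta)$, and the input alphabet is 4-PAM so that sequences have unequal energy and the modified prior (32) matters. With these choices $s = \alpha\beta\delta$. All names are hypothetical.

```python
import cmath
import itertools
import math
import random


def dft(x):
    """Plain O(N^2) discrete Fourier transform."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
            for k in range(n_pts)]


def posterior(out_f, resp_f, noise_var, seqs, log_prior):
    """Normalized P(x | output), proportional to prior(x) exp(-D / (2 noise_var)).

    D = ||out - resp (*) x||^2 is evaluated in the DFT domain via Parseval.
    """
    n_pts = len(out_f)
    logs = []
    for x in seqs:
        xf = dft(x)
        d = sum(abs(o - r * xk) ** 2 for o, r, xk in zip(out_f, resp_f, xf)) / n_pts
        logs.append(log_prior(x) - d / (2.0 * noise_var))
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]


def ape_demo(seed=0, sigma2_w=2.0, alpha=0.7, beta=1.0):
    """Return (gap with prior (32), gap with the unmodified prior)."""
    rng = random.Random(seed)
    n_pts = 4
    seqs = list(itertools.product((-3.0, -1.0, 1.0, 3.0), repeat=n_pts))
    hf = dft([1.0, 0.6, 0.0, 0.0])
    gf = [math.sqrt(alpha * (abs(hk) ** 2 + beta)) for hk in hf]  # zero-phase target
    ff = [alpha * hk.conjugate() / gk for hk, gk in zip(hf, gf)]  # G* F = alpha H*
    x0 = rng.choice(seqs)
    wf = dft([rng.gauss(0.0, math.sqrt(sigma2_w)) for _ in range(n_pts)])
    yf = [hk * xk + wk for hk, xk, wk in zip(hf, dft(x0), wf)]
    zf = [fk * yk for fk, yk in zip(ff, yf)]
    sigma2_v = alpha * sigma2_w  # noise variance (31)

    def uniform(x):
        return 0.0

    def corrected(x):  # log of the prior (32) with s = alpha * beta * delta
        return alpha * beta * sum(v * v for v in x) / (2.0 * sigma2_v)

    true_app = posterior(yf, hf, sigma2_w, seqs, uniform)
    target_app = posterior(zf, gf, sigma2_v, seqs, corrected)
    naive_app = posterior(zf, gf, sigma2_v, seqs, uniform)
    gap = max(abs(a - b) for a, b in zip(true_app, target_app))
    naive_gap = max(abs(a - b) for a, b in zip(true_app, naive_app))
    return gap, naive_gap


if __name__ == "__main__":
    print(ape_demo())
```

The first returned value compares the true APPs with those of the target channel using the prior (32) and is zero up to rounding; the second uses the unmodified prior on the target channel and is visibly nonzero for this unequal-energy alphabet.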

Corollary 1: Suppose that the target and equalizer are
chosen to be the monic solution (20) and (21). Furthermore, let
$\sigma_v^2 = \lambda$ and let

$$ \tilde{P}(x) \propto P(x) \exp\left(\frac{\langle x, s * x \rangle}{2\lambda}\right) $$

be the input prior distribution for the target channel (9), where
$s$ has the Fourier transform $S(\omega) = \lambda / S_x(\omega)$. Then, the
equalized channel is equivalent to the target channel in the
a posteriori sense.

Proof: Observe that the monic target (20) and equalizer
(21) satisfy the hypotheses in Lemma 1 if we set $\alpha = \lambda / \sigma_w^2$
and $S(\omega) = \lambda / S_x(\omega)$. Therefore, by (31), $\sigma_v^2 = \alpha \sigma_w^2 = \lambda$.
The claimed result follows from Theorem 1. ■

The above result shows that we can use the monic design
for optimal MAP detection provided that we use the prior
distribution $\tilde{P}(x)$ for the target channel. In many cases, the
input is IID with a flat spectrum ($S_x(\omega) = 1$) implying that
$\tilde{P}(x) = P(x)$, i.e., we do not need a different prior PDF for
the target channel.

Remark 2: If we pretend that the equalizer output $z$ came
from the output of the target channel with a carefully chosen
input prior distribution, then all MAP-based detection
algorithms designed for the target channel work optimally when
applied to the equalized channel. These algorithms include
hard-decision decoding such as the Viterbi algorithm, and
soft-decision decoding such as the soft-output Viterbi algorithm
(SOVA) and the BCJR algorithm. Soft-decision algorithms,
unlike the Viterbi algorithm, use an extra parameter, viz. the
variance of the additive noise in the channel. When applying
soft decoding to the target channel, we must use $\sigma_v^2$ as this
variance parameter. Our calculation above shows that $\sigma_v^2$ simply
equals $\lambda$, the equalization error variance (see (22)). This fact
is routinely assumed in many system designs with no rigorous
justification, but it is fortunately the correct value to use.

V. Practical Considerations

We now consider some practical implications of our main
result in Section IV. Henceforth, we assume that $P(x)$ is a
uniform distribution over the set of allowed code sequences.
In this case, the MAP sequence estimate (6) coincides with
the ML estimate (7).

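Theorem 2: Suppose that all the input sequences in the
message codebook have equal energy and that the equalizer $f$
and target $g$ are chosen such that

$$ G^*(\omega) F(\omega) = \alpha H^*(\omega) \tag{33} $$

$$ |G(\omega)|^2 = \alpha \left( |H(\omega)|^2 + \beta \right) \tag{34} $$

for some $\alpha > 0$ and $\beta \in \mathbb{R}$ that produces a valid $G(\omega)$. Then
we can set $\tilde{P}(x) = P(x)$. Furthermore, if $P(x)$ is uniform,
the optimal estimate of the input is

$$ \hat{x} = \arg\min_x D(y,x) = \arg\min_x D(z,x) . \tag{35} $$

Proof: In the time domain, the hypotheses imply that
$s = \tilde{g} * g - \alpha\, \tilde{h} * h = \alpha\beta\delta$ and $\tilde{g} * f = \alpha \tilde{h}$. Since all
message sequences have equal energy, $\langle x, s * x \rangle = \alpha\beta \|x\|^2$
is independent of $x$, so (32) gives $\tilde{P}(x) = P(x)$. The proof now
readily follows by applying Theorem 1. ■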

Theorem 2 is applicable, for example, if the input symbols
are elements of the Q-phase PSK constellation, i.e.,
$x_n \in \mathcal{C} = \{\sqrt{2}\, e^{j 2\pi q / Q} : q = 0, \ldots, Q-1\}$ in the complex case or the
BPSK constellation $\mathcal{C} = \{-1, +1\}$ in the real case, since all
message sequences have equal energy.
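The conclusion (35) can be illustrated by brute force in a toy setting. The sketch below assumes circular convolution on a block of eight BPSK symbols, so that the conditions (33) and (34) can be imposed bin by bin in the DFT domain, and it uses the zero-phase spectral factor for the target; both are simplifications made for this example. It reports how much $D(z,x) - \alpha D(y,x)$ varies over all candidate sequences and whether the two minimizers coincide, for several values of $\beta$ including a negative one. All names are hypothetical.

```python
import cmath
import itertools
import math
import random


def dft(x):
    """Plain O(N^2) discrete Fourier transform."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
            for k in range(n_pts)]


def family_member(hf, alpha, beta):
    """Zero-phase G and matching F satisfying (33) and (34) in every DFT bin."""
    gf = [math.sqrt(alpha * (abs(hk) ** 2 + beta)) for hk in hf]
    ff = [alpha * hk.conjugate() / gk for hk, gk in zip(hf, gf)]
    return ff, gf


def costs(out_f, resp_f, seqs):
    """D(out, x) = ||out - resp (*) x||^2 for every candidate x, via Parseval."""
    n_pts = len(out_f)
    return [sum(abs(o - r * xk) ** 2 for o, r, xk in zip(out_f, resp_f, dft(x))) / n_pts
            for x in seqs]


def compare_detectors(alpha, beta, seed=0, n_pts=8, sigma_w=0.7):
    """Return (spread of D(z,x) - alpha D(y,x) over x, whether the argmins agree)."""
    rng = random.Random(seed)
    seqs = list(itertools.product((-1.0, 1.0), repeat=n_pts))  # equal-energy BPSK
    hf = dft([1.0, 0.5] + [0.0] * (n_pts - 2))
    ff, gf = family_member(hf, alpha, beta)
    x0 = rng.choice(seqs)
    wf = dft([rng.gauss(0.0, sigma_w) for _ in range(n_pts)])
    yf = [hk * xk + wk for hk, xk, wk in zip(hf, dft(x0), wf)]
    zf = [fk * yk for fk, yk in zip(ff, yf)]
    d_y = costs(yf, hf, seqs)
    d_z = costs(zf, gf, seqs)
    offsets = [dz - alpha * dy for dz, dy in zip(d_z, d_y)]
    spread = max(offsets) - min(offsets)
    same = d_y.index(min(d_y)) == d_z.index(min(d_z))
    return spread, same


if __name__ == "__main__":
    for beta in (-0.1, 0.0, 0.4, 5.0):
        print(beta, compare_detectors(1.0, beta))
```

Because the two cost functions differ only by a positive scale and an offset that does not depend on $x$, the surrogate detector ranks all candidates exactly as the original ML detector does.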

Clearly, for this special family of equalizer and target filters
there is no performance loss in sequence detection if we
minimize the surrogate cost function $D(z,x)$ instead of the
original cost $D(y,x)$. The practical implications of this result
are that in the IIR limit we can achieve optimal sequence
detection using any solution from the family (see also [21]).
In general, these targets are as long as the channel itself.
However, we require a short FIR target for a Viterbi-based
implementation. We address this problem in Section VI where
we show how to design good FIR targets to minimize the
detection error rates.

Note that the parameter $\alpha$ is merely a scaling factor (the
target and equalizer scale as $\sqrt{\alpha}$) but $\beta$ affects the shape
of the filters. Thus, we have a degree of freedom in design
represented by $\beta$. We also have the freedom to choose the
phase response of $G(\omega)$. However, the most logical choice
would be to choose $G(\omega)$ as the causal minimum-phase
spectral factor of (34). We now consider several interesting
cases in the family of optimal solutions:

1) The case $\alpha = 1$ and $\beta = 0$ produces

$$ |G(\omega)|^2 = |H(\omega)|^2 $$

and

$$ F(\omega) = \frac{H^*(\omega)}{G^*(\omega)} = \frac{G(\omega)}{H(\omega)} $$

which is an all-pass zero-forcing equalizer that
keeps the noise white.
2) Setting $\alpha = \lambda / \sigma_w^2$ and $\beta = \sigma_w^2$ yields the monic solution
(see (23) and (24)) for $S_x(\omega) = 1$, proving its
conjectured optimality in the asymptotic (IIR) case. When
$\beta \neq \sigma_w^2$, the solution corresponds to a monic design for
a different noise level. However, this mismatch causes
no performance loss in sequence detection. Curiously,
some negative values $\beta \in (-\inf_\omega |H(\omega)|^2, 0)$ also yield
optimal solutions even though they do not represent the
variance of any meaningful noise.
Remark 3: The above argument shows that the monic
design is an optimal choice if the input spectrum is white.
However, suppose that the channel input spectrum is colored,
perhaps by the use of spectral shaping codes. Then, the monic
equalizer (21) has the form (33) required in Theorem 2. However,
the target (20) does not, because it depends on $S_x(\omega)$. Hence,
the monic design may be suboptimal for colored inputs. In
fact, for optimality we must perform the monic design for the
target and equalizer with an IID input regardless of whether
the actual input is white or colored. This is particularly true
at low SNRs where $\sigma_w$ is large and the second term in
(20) dominates. At high SNR values, the effect of the input
spectral color on training diminishes.

A. Matched Filter Equalization 

We briefly examine the special case of the solutions in
Theorem 2 when we let $\beta \to \infty$. This corresponds to the
monic solution for a very low SNR, i.e., $\sigma_w^2 \to \infty$. For
convenience, we let $\alpha = \beta$ without loss of generality. Then,
(33) and (34) imply that

$$ |G(\omega)|^2 = \beta^2 \left( 1 + |H(\omega)|^2 \beta^{-1} \right) \tag{36} $$

and

$$ F(\omega) = \beta H^*(\omega) / G^*(\omega) . $$

For $\beta \gg 1$, we use (36) to express $G(\omega)$ as

$$ G(\omega) = \beta + A(\omega) + O(\beta^{-1}) $$

where $A(\omega)$ must be causal if $G(\omega)$ is minimum-phase. Thus,
as $\beta \to \infty$, $F(\omega)$ approaches the matched filter
$H^*(\omega)$. Now, observe that

$$ |G(\omega)|^2 = \beta^2 \left[ 1 + \left( A(\omega) + A^*(\omega) \right) \beta^{-1} + O(\beta^{-2}) \right] . $$

Comparing this with (36), we obtain

$$ A(\omega) + A^*(\omega) = |H(\omega)|^2 + O(\beta^{-1}) . $$

Therefore, in the time domain

$$ a_n = \begin{cases} r_h[n], & \text{if } n > 0 \\ \tfrac{1}{2} r_h[0], & \text{if } n = 0 \\ 0, & \text{if } n < 0 \end{cases} $$

where $r_h = \tilde{h} * h$ is the auto-correlation function of $h$. Using
$g = \beta\delta + a + O(\beta^{-1})$ it is readily verified that

$$ D(z,x) = \|z - g * x\|^2 $$
$$ = \|z\|^2 - 2\Re\langle g * x, z \rangle + \langle x, \tilde{g} * g * x \rangle $$
$$ = \beta^2 \|x\|^2 - 2\beta \left( \Re\langle x, z \rangle - \langle x, a * x \rangle \right) + O(1) . $$

Since $\|x\|^2$ is constant for all input sequences and $\beta \to \infty$,
we deduce that the ML estimation rule becomes

$$ \arg\min_x D(z,x) = \arg\max_x \Re\langle x, z - a * x \rangle . \tag{37} $$

We interpret the above calculations as follows. The equalizer
is a matched filter, $f = \tilde{h}$, and the term $z - a * x$
represents the equalizer output with the post-cursor ISI removed
using decision feedback. The estimator simply maximizes the
correlation between this sequence and the input.

It is easy to verify that $\langle x, a * x \rangle = \tfrac{1}{2} \|h * x\|^2$. Thus, the
matched filter equalization structure may alternatively be derived
directly from Lemma 1 by letting $g = \delta$ and $f = \tilde{h}$.
This approach gives us the following rule for ML estimation

$$ \hat{x} = \arg\max_x \; \Re\langle x, z \rangle - \tfrac{1}{2} \|h * x\|^2 $$

which is equivalent to (37).
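The matched-filter form of the ML rule can be checked directly against the original cost by exhaustive search. The following minimal sketch assumes a real causal FIR channel, BPSK inputs, and linear convolution; the function names are hypothetical.

```python
import itertools
import random


def channel(h, x):
    """Noise-free output h * x of a causal FIR channel (full linear convolution)."""
    out = [0.0] * (len(x) + len(h) - 1)
    for m, xm in enumerate(x):
        for k, hk in enumerate(h):
            out[m + k] += hk * xm
    return out


def matched_filter(h, y, M):
    """z = h~ * y restricted to n = 0..M-1, i.e. z_n = sum_k h_k y_{n+k} (real h)."""
    return [sum(hk * y[n + k] for k, hk in enumerate(h)) for n in range(M)]


def correlation_metric(h, x, z):
    """<x, z> - (1/2) ||h * x||^2, the quantity maximized by the matched-filter rule."""
    hx = channel(h, x)
    return sum(xn * zn for xn, zn in zip(x, z)) - 0.5 * sum(v * v for v in hx)


def ml_cost(h, x, y):
    """D(y, x) = ||y - h * x||^2."""
    return sum((yn - sn) ** 2 for yn, sn in zip(y, channel(h, x)))


if __name__ == "__main__":
    rng = random.Random(3)
    h = [1.0, 0.6, -0.3]  # hypothetical channel
    M = 6
    x0 = [rng.choice((-1.0, 1.0)) for _ in range(M)]
    y = [s + rng.gauss(0.0, 0.9) for s in channel(h, x0)]
    z = matched_filter(h, y, M)
    cands = list(itertools.product((-1.0, 1.0), repeat=M))
    best_ml = min(cands, key=lambda x: ml_cost(h, x, y))
    best_mf = max(cands, key=lambda x: correlation_metric(h, x, z))
    print(best_ml == best_mf)
```

For every candidate, $D(y,x) = \|y\|^2 - 2\left(\langle x, z \rangle - \tfrac{1}{2}\|h * x\|^2\right)$, so minimizing the cost and maximizing the correlation metric select the same sequence.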



VI. Optimal FIR Target Design 

In the previous sections we showed the existence of a family
of equalizers and targets that achieve the optimal sequence
detection performance if we pretend that the equalizer output
came from the target channel. Unfortunately, the optimal
target, being the minimum phase spectral factor of (24), has
the same length as the original channel (except in rare cases
where it can be shorter). As such, we have not reduced the
detector complexity by equalization.

In this section, we consider the more practical problem
of the design of FIR targets to achieve the best detection
performance. We consider only real channels with BPSK input
symbols ($\mathcal{C} = \{-1, +1\}$). With some effort, these results can
be generalized to complex channels or non-binary inputs as
well.

Suppose that $x^0$ is the actual input to the channel, and $\hat{x}$
is the ML sequence estimate. Then $e = \hat{x} - x^0$ is an error
sequence. We say that two error sequences belong to the same
equivalence class if they are related to each other by a
time-shift or phase-rotation (or sign-change). Of all error sequences,
a dominant error sequence is one that minimizes $\|\epsilon\|^2$
where $\epsilon = h * e$ is the noise-free channel response to the input
$e$. We call $\epsilon$ a dominant output error sequence.

Clearly, dominant error sequences are not unique because
all sequences in the equivalence class of a dominant error
sequence are also dominant. However, we shall assume that
there is a unique dominant equivalence class whose
representative element $e$ has the canonical form: $e_0 \neq 0$ and $e_n = 0$
for $n < 0$. Indeed, some channels could have a multiplicity
of dominant events that belong to different equivalence
classes. In that case our probability of error estimate would
be scaled by the multiplicity factor.

Let Qg(-) be the Gaussian Q-function 



signal-to-noise ratio of the system 

|5i(e,p*h,*e)| 2 



1 



We now estimate the probability of sequence detection error 
for any choice of target and equalizer in terms of the Q- 
function. 

Theorem 3: At high SNR, the probability of sequence de- 
tection error for a real BPSK channel is given by P| cq ~ 
kQ g (VSNR) for some constant k with SNR is the effective 



SNR 



a 2 w \\p-ke\\ 2 



i! (<Z — P * h) * e — v\\ 2 

where p — f*g, q = g*g, and v is any sequence with the 
same temporal support as the dominant error sequence e. 

Theorem 3 is proved in Appendix I using error analysis
similar to that of standard Viterbi detection [9,22]. Note that
the bit error rate (BER) takes the same form as $P_e^{\mathrm{seq}}$
but with a different constant in place of $\kappa$. The above result
is applicable to both FIR and IIR equalizers and targets. The optimal
equalizer $f$ and target $g$ are chosen to maximize SNR subject to
relevant constraints.
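As an illustration, the effective SNR of Theorem 3 can be evaluated numerically for short real sequences. The sketch below is not part of the original analysis: it assumes real-valued filters, treats the tilde as time reversal, and stores each sequence as a dictionary from time index to value. The names `conv`, `reverse`, and `effective_snr` are hypothetical helpers introduced only for this example.

```python
from collections import defaultdict


def conv(a, b):
    """Convolve two real sequences stored as {time index: value} dicts."""
    out = defaultdict(float)
    for i, ai in a.items():
        for j, bj in b.items():
            out[i + j] += ai * bj
    return dict(out)


def reverse(a):
    """Time reversal (assumed meaning of the tilde for real sequences)."""
    return {-n: v for n, v in a.items()}


def effective_snr(f, g, h, e, sigma_w2):
    """Effective SNR of Theorem 3 for equalizer f, target g, channel h."""
    p = conv(reverse(g), f)          # p = g~ * f
    q = conv(reverse(g), g)          # q = g~ * g
    ph = conv(p, h)
    phe = conv(ph, e)
    # Numerator: |<e, p*h*e>|^2 (real sequences, so no real-part needed).
    num = sum(v * phe.get(n, 0.0) for n, v in e.items()) ** 2
    # c = (q - p*h) * e; the best v cancels c on the support of e.
    diff = {n: q.get(n, 0.0) - ph.get(n, 0.0) for n in set(q) | set(ph)}
    c = conv(diff, e)
    support = {n for n, v in e.items() if v != 0}
    resid = sum(v * v for n, v in c.items() if n not in support)
    noise = sigma_w2 * sum(v * v for v in conv(p, e).values())
    return num / (resid + noise)


# Toy example (values chosen arbitrarily for illustration).
h = {0: 1.0, 1: 0.5}
e = {0: 1.0, 1: -1.0}
g = {0: 1.0, 1: 0.4}
f = reverse(h)                       # matched-filter equalizer
snr = effective_snr(f, g, h, e, sigma_w2=0.1)
```

By the Cauchy-Schwarz inequality the returned value cannot exceed $\|h * e\|^2/\sigma_w^2$, and it is unchanged when $f$ and $g$ are scaled by a common factor, which gives two quick sanity checks.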

For practical reasons, we seek FIR targets, since the detector
implementation complexity is exponential in the target length.
The constraint on the equalizer length is less important, since
the complexity growth is only linear. For simplicity we assume
that the equalizer is IIR but the target is FIR with length $L$. In
this case, it is more convenient to maximize SNR over $p$ and
$q$, because $f$ and $g$ can be recovered uniquely from $p$ and $q$
by spectral factorization. Note that $p$ is IIR but $q$, being the
autocorrelation function of $g$, is FIR. Furthermore, we have
$Q(\omega) \ge 0$. We write $\mathrm{SNR} = \max_v \mathrm{SNR}(p, q, v)$, where

$$\mathrm{SNR}(p, q, v) \stackrel{\mathrm{def}}{=} \frac{|\Re\langle e,\, p * h * e\rangle|^2}{\|(q - p * h) * e - v\|^2 + \sigma_w^2 \|p * e\|^2}.$$

Now observe that

$$\mathrm{SNR}(p, q, v) = \mathrm{SNR}(p,\, q + \beta\delta,\, v + \beta e)$$

for any $(p, q, v)$ and $\beta \in \mathbb{R}$, because
$(q + \beta\delta) * e - (v + \beta e) = q * e - v$. Moreover, if $v$ has
the same temporal support as $e$, then so does $v' = v + \beta e$. Since
we are maximizing $\mathrm{SNR}(p, q, v)$ over all $v$, we conclude that
the quantity

$$\max_v\, \mathrm{SNR}(p, q, v) \tag{38}$$

would remain unchanged if we replace $q$ by $q + \beta\delta$. This
enables us to temporarily replace the constraint $Q(\omega) \ge 0$ by
$q_0 = 0$ for the sake of the maximization. Having removed the
constraint on $Q(\omega)$, the maximization is readily transformed
into a quadratic minimization. As a final step, we add a
sufficiently large $\beta$ to the solution $Q(\omega)$ to make it satisfy
$Q(\omega) \ge 0$.
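The final step of this procedure (adding a sufficiently large $\beta$) can be sketched numerically. The example below is only an illustration under stated assumptions: the one-sided taps `q_pos` $= (q_1, \ldots, q_L)$ of a symmetric real $q$ with $q_0 = 0$ are given, the spectrum is sampled on a uniform frequency grid rather than minimized exactly, and the function names are hypothetical.

```python
import numpy as np


def target_spectrum(q_pos, q0, w):
    """Q(w) = q0 + 2 * sum_l q_l cos(l w) for a symmetric real q."""
    lags = np.arange(1, len(q_pos) + 1)
    return q0 + 2.0 * np.cos(np.outer(w, lags)) @ np.asarray(q_pos, float)


def smallest_beta(q_pos, n_grid=4096, margin=1e-9):
    """Grid estimate of the smallest beta making Q(w) + beta nonnegative."""
    w = np.linspace(-np.pi, np.pi, n_grid)
    q_min = target_spectrum(q_pos, 0.0, w).min()
    # A small margin guards against the grid missing the exact minimum.
    return max(0.0, -q_min) + margin


q_pos = [0.8, -0.3]                  # arbitrary illustrative taps
beta = smallest_beta(q_pos)
```

Because the effective SNR is invariant under $q \to q + \beta\delta$, any larger value of $\beta$ would serve equally well as far as the SNR is concerned.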

The analytical solution to (38) is presented in Appendix II.
We also show there that the noise variance of the hypothetical
target channel (31) is set to $\lambda$, the Lagrange multiplier
used in the optimization.

Clearly, the above maximization admits infinitely many
solutions parameterized by $\beta$. As the target length approaches
infinity, these solutions converge precisely to the family of
solutions in Theorem 2. In this limit, the resulting equalizer and
target filters maximize the effective SNR. Furthermore,
this maximum value is

$$\mathrm{SNR}_{\max} = \frac{\|h * e\|^2}{\sigma_w^2}. \tag{39}$$

In practice we are interested in FIR equalizers for ease of
implementation. We point out that we could still maximize
the effective SNR, albeit numerically, over all FIR targets and
equalizers with length constraints. If we choose to use FIR
equalizers, we would have the additional task of optimizing
the processing delay, which is an important design parameter
[6-8].

Fig. 3. FIR approximation loss vs. target length

VII. Examples

We now illustrate the results of the preceding sections with
an example. Consider the real ISI channel (1) with impulse
response

$$h_n = \begin{cases} e^{-n/2}, & 0 \le n \le 8 \\ 0, & \text{otherwise} \end{cases}$$

with IID binary input symbols ($x_n \in \mathcal{C} = \{-1, +1\}$) and SNR
defined as $\|h\|^2/\sigma_w^2$, where $\sigma_w^2$ is the noise variance.
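For readers who want to reproduce the setup, the example channel and the noise variance implied by a given SNR can be generated as follows. This is a minimal sketch: the nine-tap support ($0 \le n \le 8$) is an assumption inferred from the $2^8$-state full complexity detector mentioned later, and the helper names are hypothetical.

```python
import numpy as np


def example_channel(num_taps=9):
    """Impulse response h_n = exp(-n/2) for n = 0, ..., num_taps - 1."""
    n = np.arange(num_taps)
    return np.exp(-n / 2.0)


def noise_variance(h, snr_db):
    """Noise variance sigma_w^2 such that ||h||^2 / sigma_w^2 equals the SNR."""
    return float(np.sum(h ** 2) / 10.0 ** (snr_db / 10.0))


h = example_channel()
sigma_w2 = noise_variance(h, snr_db=10.0)
```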

We first study the effect of the target length on the effective
SNR of the system. The optimal equalizers and targets are
computed for target lengths of 2 and longer, and the resulting
values of SNR are calculated. In the IIR limit for the
target length we obtain the maximum value $\mathrm{SNR}_{\max}$ given by
(39). Fig. 3 shows the FIR approximation loss,
$(\mathrm{SNR}_{\max} - \mathrm{SNR})$, for various finite target lengths at
an SNR of 10 dB. In this example the optimal length-3 target incurs
a penalty of about 0.075 dB, and the performance loss for longer
targets diminishes quickly.

Next, we evaluate the BER performance of the reduced
complexity detectors. At each SNR we design the optimal
length-3 target and the IIR equalizer truncated to 21 taps
(centered at the origin). This length is sufficient because the
truncated equalizer captures most of the energy in the equalizer
taps. The dominant error event for this channel is $e = \{1, -1\}$.
We also design length-21 MMSE equalizers (centered at 0) and
length-3 targets, as described in Section III for the monic target
constraint.
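The dominant error event quoted above can be checked by a small exhaustive search. The sketch below is an illustration rather than the procedure used in the paper: it assumes the nine-tap channel $h_n = e^{-n/2}$, normalized error symbols in $\{-1, 0, +1\}$ with $e_0 = 1$ (the canonical representative up to sign), and a search limited to events of at most `max_len` symbols. The function name is hypothetical.

```python
import itertools

import numpy as np


def dominant_error_event(h, max_len=5):
    """Search short error events e (e_0 = 1) minimizing ||h * e||^2."""
    best, best_energy = None, np.inf
    for length in range(1, max_len + 1):
        for tail in itertools.product((-1, 0, 1), repeat=length - 1):
            e = (1,) + tail
            if e[-1] == 0:
                continue  # trailing zero: same event as a shorter one
            energy = float(np.sum(np.convolve(h, e) ** 2))
            if energy < best_energy:
                best, best_energy = e, energy
    return best, best_energy


h = np.exp(-np.arange(9) / 2.0)
e_dom, d2 = dominant_error_event(h)
print(e_dom, d2)
```

For this channel the search returns the event $(1, -1)$, in agreement with the text.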

Using computer simulations, we compare the two designs
in terms of their BER performance for IID binary inputs. Both
systems use the Viterbi algorithm to perform the sequence
detection. The results are shown in Fig. 4 along with the BER
of the full complexity Viterbi detector (with $2^8$ states) that
uses no channel shortening equalization. It is clear that both
reduced complexity detectors perform identically, with
a small penalty relative to the full complexity detector. The
optimality of the monic design is predicted by Theorem 2 for
the case of IIR filters. Indeed, we observe numerically that the
monic design is nearly optimal for FIR filters as well.

Fig. 4. Comparison of BER performance of two designs for binary signaling

Next, we consider the same ISI channel with an IID ternary
input ($x_n \in \mathcal{C} = \{-\sqrt{3/2},\, 0,\, +\sqrt{3/2}\}$), which has unit
average symbol energy. The input symbols themselves have
unequal energies. Recall from the results for the IIR case in Section
IV that the optimal sequence detector for the equalized channel
needs to pretend that it sees the output of the target channel
with the input prior distribution given by (32):

$$\tilde{P}(x) \propto \exp\left(\frac{\langle x,\, s * x\rangle}{\sigma_v^2}\right)$$

with $\sigma_v^2$ denoting the noise variance of the target channel,
where $s = \tilde{g} * g - \alpha\, \tilde{h} * h$. Thus, the optimal detector
needs to minimize the cost function

$$\min_x \left( \|z - g * x\|^2 - \langle x,\, s * x\rangle \right)$$

where the second term is a correction term that originates from
the input prior distribution $\tilde{P}(x)$. For the choice of equalizer
and target in Theorem 2 we have

$$s \stackrel{\mathrm{def}}{=} \tilde{g} * g - \alpha\, \tilde{h} * h = \alpha\beta\delta.$$

Therefore, $\langle x,\, s * x\rangle = \alpha\beta\|x\|^2$, which depends on the
energy of the sequence. The correction term is an issue only for signal
constellations with unequal symbol energies. For the monic target
and MMSE equalizer design, $\alpha\beta$ equals the variance
of the equalization error, $\lambda$. Thus, the cost function reduces to

$$\min_x \left( \|z - g * x\|^2 - \lambda\|x\|^2 \right).$$

We directly adapt this expression to the FIR case as well by
subtracting $\lambda|x_n|^2$ from the trellis branch metric at time $n$. In
fact, the detector would be suboptimal without the correction
term, as we confirm below.
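To make the modified branch metric concrete, the following sketch runs a Viterbi search that subtracts $\lambda|x_n|^2$ from each branch metric. It is a minimal illustration and not the simulation code behind the figures: it assumes a real FIR target `g`, an all-zero prehistory before time 0, no trellis termination, and it keeps full survivor paths for clarity. The names `viterbi_corrected` and `brute_force_metric` are hypothetical.

```python
import itertools

import numpy as np


def viterbi_corrected(z, g, alphabet, lam):
    """Minimize sum_n (z_n - (g*x)_n)^2 - lam * x_n^2 over x via Viterbi."""
    m = len(g) - 1                        # target memory
    start = (0.0,) * m                    # assumed all-zero prehistory
    paths = {start: (0.0, [])}            # state -> (metric, survivor)
    for zn in z:
        new_paths = {}
        for state, (metric, seq) in paths.items():
            for x in alphabet:
                # state holds (x_{n-1}, ..., x_{n-m})
                y = g[0] * x + sum(gk * s for gk, s in zip(g[1:], state))
                cost = metric + (zn - y) ** 2 - lam * x ** 2
                nxt = ((x,) + state)[:m]
                if nxt not in new_paths or cost < new_paths[nxt][0]:
                    new_paths[nxt] = (cost, seq + [x])
        paths = new_paths
    return min(paths.values(), key=lambda t: t[0])


def brute_force_metric(z, g, alphabet, lam):
    """Exhaustive minimum of the same cost, for checking short blocks."""
    best = np.inf
    for x in itertools.product(alphabet, repeat=len(z)):
        y = np.convolve(x, g)[: len(z)]
        cost = float(np.sum((np.asarray(z) - y) ** 2) - lam * np.sum(np.square(x)))
        best = min(best, cost)
    return best
```

Setting `lam = 0` recovers the ordinary Euclidean branch metric, so the correction adds only one multiply-and-subtract per branch.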






We design a length-3 monic GPR target and a length-21
MMSE equalizer for this channel and calculate the symbol
error rates (SER) numerically using the Viterbi algorithm.
Fig. 5 shows the SER obtained with and without the correction
term in the trellis branch metric. The figure also shows the SER
for the full complexity Viterbi detector (with $3^8$ states) that
uses no channel shortening equalization. There is a small but
noticeable gain in detection performance with the correction
term. It must be noted that this modification adds little to the
detector complexity. As $\lambda$ becomes smaller (at
higher SNRs), the correction term becomes smaller as well,
and indeed the performance gain due to the correction term
diminishes at high SNRs.

Fig. 5. Comparison of SER performance for ternary input signaling



VIII. Summary

Although a large body of literature exists for the design of
optimal FIR targets and equalizers, the implicit assumption
in virtually all existing work on this subject is that MMSE
equalization is optimal. The purpose of this work was to
question that assumption. The main contribution of this work
is a new perspective on the problem of channel shortening
equalization in terms of the underlying a posteriori probabilities,
unlike the traditional approach of using the MMSE
equalization error as the criterion. We introduced the idea of a
posteriori equivalence (APE) between the equalized and target
channels. Under this form of equivalence, any MAP-based
decoding algorithm designed for the target channel would also
work optimally when applied to the equalized channel. In other
words, as far as MAP decoding is concerned, we can pretend
that the equalized channel is the target channel.

In our analysis of the problem we treat $f$, $g$, the noise
variance of the target channel, and in some cases even the
input PDF $\tilde{P}(x)$ for the hypothetical target channel as design
parameters. The equivalence is expressed as a set of algebraic
conditions on the design parameters. The APE conditions
admit an infinite family of solutions or designs for the equalizer
and target. In the special case that the input is IID and all
the code sequences have equal energy, we showed that the
"monic solution," i.e., the MMSE equalizer designed for a
monic constrained target, belongs to this optimal
family of designs. We also observed that the monic solution
must be designed for spectrally white inputs even if the actual
input is colored. The family of designs produces IIR filters in
general, which limits their practical use, whereas a low
complexity implementation of optimal sequence
detection (using Viterbi or BCJR-like algorithms) requires
short FIR targets.

We also derived an expression for the probability of sequence
detection error, assuming IID inputs, for arbitrary FIR
or IIR targets and equalizers. Using this as a performance
measure, we proposed a design algorithm to find the optimal IIR
equalizer and FIR target. In the IIR limit for the target,
these solutions coincide with the previously derived optimal IIR
family of designs that satisfy APE.

These results were applied to an example ISI channel.
Numerically, we observe that for IID inputs we obtain nearly
optimal performance using the monic design. For input signal
constellations with unequal symbol energies, we also need to
treat the input PDF $\tilde{P}(x)$ for the target channel as a design
parameter. The optimal detector is designed for the target
channel with the prior $\tilde{P}(x)$ incorporated into the Viterbi
branch metric as a correction term, which would normally
have been ignored if we simply used the monic design. This is
illustrated for the IID ternary signaling example (Fig. 5), where
we see a small but noticeable gain from using the correction term.

Appendix I
Proof of Theorem 3

Suppose that $x^{\circ} \in \mathcal{X}$ is the transmitted sequence, where
$\mathcal{X}$ is the set of sequences that are equally likely to be
transmitted. The channel and equalizer outputs are
$y = h * x^{\circ} + w$ and $z = f * y$, respectively. All sequences in the
codebook have equal energy because the input symbols are IID and
binary. Thus, the target channel input is also treated as being IID:
$\tilde{P}(x) = P(x)$.

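The Viterbi detector for the equalized channel computes the
sequence $x$ that minimizes $D(z, x)$. Thus, the probability of
sequence detection error is

$$P_e^{\mathrm{seq}} = P\{D(z, x) < D(z, x^{\circ}) \text{ for some } x \neq x^{\circ}\} \le \sum_{x \neq x^{\circ}} P\{D(z, x) < D(z, x^{\circ})\} \tag{40}$$

where the second step follows from the union bound. Using
the property that

$$\|a\|^2 - \|b\|^2 = \Re\langle a - b,\, a + b\rangle \tag{41}$$

for any $a$ and $b$, where $\Re$ denotes the real part, we obtain

$$D(z, x) - D(z, x^{\circ}) = \|z - g * x\|^2 - \|z - g * x^{\circ}\|^2 = -4\Re\langle g * x^-,\, z - g * x^+\rangle \tag{42}$$

where

$$x^{\pm} \stackrel{\mathrm{def}}{=} \frac{x \pm x^{\circ}}{2}.$$

Applying (1) to (42) and writing $z = f * y$, where
$y = h * (x^+ - x^-) + w$, we obtain

$$D(z, x) - D(z, x^{\circ}) = 4\Re\langle x^-,\, \tilde{g} * f * h * x^-\rangle + 4\Re\langle x^-,\, \tilde{g} * (g - f * h) * x^+\rangle - 4\Re\langle x^-,\, \tilde{g} * f * w\rangle \equiv 4\left(\phi(x^-) + \Delta(x^-, x^+) - \psi(x^-)\right)$$

where

$$\phi(x^-) \stackrel{\mathrm{def}}{=} \Re\langle x^-,\, p * h * x^-\rangle \tag{43}$$
$$\Delta(x^-, x^+) \stackrel{\mathrm{def}}{=} \Re\langle x^-,\, (q - p * h) * x^+\rangle \tag{44}$$
$$\psi(x^-) \stackrel{\mathrm{def}}{=} \Re\langle x^-,\, p * w\rangle \tag{45}$$
$$p \stackrel{\mathrm{def}}{=} \tilde{g} * f \tag{46}$$
$$q \stackrel{\mathrm{def}}{=} \tilde{g} * g. \tag{47}$$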

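Note that $\psi(x^-) \sim \mathcal{N}(0,\, \sigma_w^2 \|p * x^-\|^2)$ is normally
distributed. Therefore,

$$\Pi(x^-, x^+) \stackrel{\mathrm{def}}{=} P\{D(z, x) < D(z, x^{\circ})\} = P\{\psi(x^-) - \Delta(x^-, x^+) > \phi(x^-)\}. \tag{48}$$

Thus, (40) can be rewritten as

$$P_e^{\mathrm{seq}} \le \frac{1}{|\mathcal{X}|} \sum_{x^-} \sum_{x^+ \in \mathcal{X}^+(x^-)} \Pi(x^-, x^+)$$

where $\mathcal{X}^+(x^-)$ is the set of sequences $x^+$ such that $x^+ + x^-$
and $x^+ - x^-$ are valid sequences in $\mathcal{X}$. Note that $x^+$ is
uniformly distributed in $\mathcal{X}^+(x^-)$ when conditioned on $x^-$. In
the high SNR regime, it is a good approximation to assume
that dominant error sequences are the only source of detection
errors. This allows us to fix $x^- = e$ for any error sequence
$e \in \mathcal{E}$ in the equivalence class $\mathcal{E}$ of dominant error
sequences. This yields

$$P_e^{\mathrm{seq}} \approx \frac{1}{|\mathcal{X}|} \sum_{e \in \mathcal{E}} \sum_{x^+ \in \mathcal{X}^+(e)} \Pi(e, x^+) = \frac{|\mathcal{E}|\,|\mathcal{X}^+(e)|}{|\mathcal{X}|}\, \mathrm{E}\,\Pi(e, x^+)$$

with the expectation taken over $x^+$ given that $x^- = e$. For
analytical tractability, we assume that $\Delta(e, x^+)$ is approximately
normally distributed. Thus, (48) yields

$$P_e^{\mathrm{seq}} \approx \kappa\, Q_g\left(\frac{\phi(e)}{\sigma(e)}\right) \tag{49}$$

where

$$\sigma^2(e) = \mathrm{var}(\Delta(e, x^+)) + \sigma_w^2 \|p * e\|^2 \tag{50}$$
$$\kappa = \frac{|\mathcal{E}|\,|\mathcal{X}^+(e)|}{|\mathcal{X}|}. \tag{51}$$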



The constant $\kappa$ is evidently the product of the number of
allowable dominant error sequences $|\mathcal{E}|$ and the probability,
$|\mathcal{X}^+(e)|/|\mathcal{X}|$, that $x^{\circ}$ will allow that error sequence.
The bit error rate (BER) is approximated by

$$P_e^{\mathrm{bit}} \approx \frac{w_H(e)}{M}\, P_e^{\mathrm{seq}} \tag{52}$$

where $M$ is the length of the input codewords. The above
calculations are similar to the probability of error analysis for
classical Viterbi detection [9].

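The only remaining step is to estimate the variance of
$\Delta(e, x^+)$. First note that

$$\Delta(e, x^+) = \Re\langle e,\, (q - p * h) * x^+\rangle = \Re\langle a,\, x^+\rangle$$

where $a = (q - p * h) * e$. Now, $\Delta(e, x^+)$ is zero-mean
because $x^+$ is zero-mean. Hence, the conditional variance of
$\Delta(e, x^+)$ is

$$\mathrm{var}(\Delta(e, x^+)) = \frac{1}{|\mathcal{X}^+(e)|} \sum_{x^+ \in \mathcal{X}^+(e)} \left(\Delta(e, x^+)\right)^2 = \frac{1}{|\mathcal{X}^+(e)|} \sum_{x^+ \in \mathcal{X}^+(e)} |\Re\langle a,\, x^+\rangle|^2.$$

Since the input is binary with symbols being $\pm 1$, $\mathcal{X}^+(e)$
contains all sequences $x^+$ that satisfy

$$x_n^+ \in \{-1, +1\} \text{ if } e_n = 0, \qquad x_n^+ = 0 \text{ if } e_n \neq 0.$$

It is an easy exercise to check that

$$\mathrm{var}(\Delta(e, x^+)) = \sum_{\{n\,:\,e_n = 0\}} |a_n|^2$$

which may also be written as

$$\mathrm{var}(\Delta(e, x^+)) = \min_v \|a - v\|^2 = \min_v \|(q - p * h) * e - v\|^2 \tag{53}$$

where $v$ is a vector whose temporal support is the same as
that of $e$. Combining (43), (49), (50), and (53), we obtain
$P_e^{\mathrm{seq}} \approx \kappa\, Q_g(\sqrt{\mathrm{SNR}})$, where

$$\mathrm{SNR} = \frac{|\Re\langle e,\, p * h * e\rangle|^2}{\min_v \|(q - p * h) * e - v\|^2 + \sigma_w^2 \|p * e\|^2} = \max_v \frac{|\Re\langle e,\, p * h * e\rangle|^2}{\|(q - p * h) * e - v\|^2 + \sigma_w^2 \|p * e\|^2}$$

is the effective SNR of the system.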
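The variance identity (53) is easy to confirm by enumeration on a finite window. The sketch below is only a sanity check under stated assumptions: the sequences $a$ and $e$ are real and truncated to a common finite index window, $x^+$ takes independent equiprobable values $\pm 1$ where $e_n = 0$ and is zero elsewhere, and the function names are hypothetical.

```python
import itertools

import numpy as np


def delta_variance_enumerated(a, e):
    """Average of <a, x+>^2 over all admissible x+ (zero where e_n != 0)."""
    free = [n for n, en in enumerate(e) if en == 0]
    values = []
    for signs in itertools.product((-1.0, 1.0), repeat=len(free)):
        x_plus = np.zeros(len(e))
        x_plus[free] = signs
        values.append(float(np.dot(a, x_plus)))
    # The mean of <a, x+> is zero, so the mean square is the variance.
    return float(np.mean(np.square(values)))


def delta_variance_closed_form(a, e):
    """Energy of a outside the support of e, i.e. min_v ||a - v||^2."""
    a = np.asarray(a, dtype=float)
    return float(np.sum(a[np.asarray(e) == 0] ** 2))


a = np.array([0.3, -1.2, 0.7, 0.1, -0.4, 0.9, 0.25])   # arbitrary values
e = np.array([0, 1, -1, 0, 0, 1, 0])
v_enum = delta_variance_enumerated(a, e)
v_closed = delta_variance_closed_form(a, e)
```

The two values agree because the cross terms $a_m a_n x_m^+ x_n^+$ average to zero over independent equiprobable signs.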

Appendix II
Analytical Solution to (38)



The maximization (38) may be rewritten as

$$\min\ \|(q - p * h) * e - v\|^2 + \sigma_w^2 \|p * e\|^2$$

subject to $q_0 = 0$ and

$$\Re\langle e,\, p * h * e\rangle = 1 \tag{54}$$

thereby removing the scaling invariance of the solutions.
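Define $S = \{l : e_l \neq 0\} = \{s_1, \ldots, s_J\}$. Then

$$V(\omega) = \sum_{l \in S} v_l\, e^{-jl\omega}, \qquad Q(\omega) = 2 \sum_{l=1}^{L} q_l \cos(l\omega)$$

where $v_l$, $l \in S$, and $q_l$, $l = 1, \ldots, L$, are the FIR parameters.
Therefore,

$$Q(\omega)E(\omega) - V(\omega) = B(\omega)x$$

where

$$B(\omega) = \left(B_1(\omega)\ \ B_2(\omega)\right)$$
$$B_1(\omega) = 2E(\omega)\left(\cos(\omega)\ \ \cos(2\omega)\ \ \cdots\ \ \cos(L\omega)\right)$$
$$B_2(\omega) = -\left(e^{-js_1\omega}\ \ \cdots\ \ e^{-js_J\omega}\right)$$
$$x = [q_1, \ldots, q_L, v_{s_1}, \ldots, v_{s_J}]^T.$$

Finally, let

$$R(\omega) \stackrel{\mathrm{def}}{=} P(\omega)H(\omega)E(\omega). \tag{55}$$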

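In terms of the above quantities, we can rewrite the optimization as

$$\min\ \frac{1}{2\pi}\int |B(\omega)x - R(\omega)|^2\, d\omega + \frac{\sigma_w^2}{2\pi}\int |R(\omega)/H(\omega)|^2\, d\omega$$

subject to

$$\Re\, \frac{1}{2\pi}\int R^*(\omega)E(\omega)\, d\omega = 1. \tag{56}$$

All integrals are taken over $[-\pi, \pi]$. The cost function reduces
to

$$\frac{1}{2\pi}\int \left[A(\omega)|R(\omega)|^2 - 2\Re\{R^*(\omega)B(\omega)x\}\right] d\omega + x^* C x$$

where $A(\omega) = 1 + \sigma_w^2/|H(\omega)|^2$ and

$$C = \frac{1}{2\pi}\int B^*(\omega)B(\omega)\, d\omega.$$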

Using variational calculus we obtain

$$A(\omega)R(\omega) - B(\omega)x = \lambda E(\omega)$$
$$-\frac{1}{2\pi}\int B^*(\omega)R(\omega)\, d\omega + Cx = 0$$

where $\lambda$ is a Lagrange multiplier. Solving the above simultaneous
equations yields

$$R(\omega) = \frac{B(\omega)x + \lambda E(\omega)}{A(\omega)}$$
$$x = \lambda\,(C - D)^{-1} \int \frac{B^*(\omega)E(\omega)}{2\pi A(\omega)}\, d\omega$$

where

$$D = \frac{1}{2\pi}\int \frac{B^*(\omega)B(\omega)}{A(\omega)}\, d\omega.$$

Finally, $P(\omega)$ can be solved from (55). Note that $\lambda$ is uniquely
determined by the constraint (56). However, we could choose
an arbitrary value for $\lambda$ (such as $\lambda = 1$) since it merely scales
the solution without altering the value of SNR.
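The closed-form solution above can be evaluated on a frequency grid. The sketch below is one possible discretization, not the authors' implementation: it assumes real causal sequences `h` and `e` indexed from 0, a channel with no spectral nulls, integrals approximated by averages over `n_grid` uniformly spaced frequencies, and $\lambda = 1$. The function name `solve_fir_target` is hypothetical, and the SNR is computed directly from $R(\omega)$ so that no division by $E(\omega)$ is needed.

```python
import numpy as np


def solve_fir_target(h, e, L, sigma_w2, n_grid=1024, lam=1.0):
    """Return (q_1..q_L, v on supp(e), effective SNR) on a frequency grid."""
    w = 2 * np.pi * np.arange(n_grid) / n_grid

    def dtft(s):
        # Fourier transform of a causal sequence sampled on the grid.
        return np.exp(-1j * np.outer(w, np.arange(len(s)))) @ np.asarray(s, float)

    H, E = dtft(h), dtft(e)
    S = np.flatnonzero(e)                                  # support of e
    B1 = 2 * E[:, None] * np.cos(np.outer(w, np.arange(1, L + 1)))
    B2 = -np.exp(-1j * np.outer(w, S))
    B = np.hstack([B1, B2])                                # row k is B(w_k)
    A = 1 + sigma_w2 / np.abs(H) ** 2
    # Grid averages stand in for (1/2pi) * integral over [-pi, pi].
    C = (B.conj().T @ B).real / n_grid
    D = (B.conj().T @ (B / A[:, None])).real / n_grid
    b = (B.conj().T @ (E / A)).real / n_grid
    x = lam * np.linalg.solve(C - D, b)
    R = (B @ x + lam * E) / A
    num = np.mean(R.conj() * E).real ** 2
    den = np.mean(np.abs(B @ x - R) ** 2) + sigma_w2 * np.mean(np.abs(R / H) ** 2)
    return x[:L], x[L:], float(num / den)


h = np.exp(-np.arange(9) / 2.0)
e = [1.0, -1.0]
sigma_w2 = float(np.sum(h ** 2) / 10.0)                    # 10 dB SNR
q_taps, v_taps, snr = solve_fir_target(h, e, L=2, sigma_w2=sigma_w2)
```

Increasing `L` enlarges the feasible set, so the returned SNR should not decrease with `L` and should stay below $\|h * e\|^2/\sigma_w^2$. The returned taps correspond to $q_0 = 0$; a sufficiently large $\beta$ still has to be added before spectral factorization.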

In the long target (IIR) limit, it is easy to see that the
solutions converge to the following limits:

$$\tilde{g} * f = \frac{\lambda\, \tilde{h}}{\sigma_w^2}, \qquad \tilde{g} * g = \frac{\lambda\,(\tilde{h} * h + \beta\delta)}{\sigma_w^2}$$

for some $\beta$, which have the required form in Theorem 2 for
$\alpha = \lambda/\sigma_w^2$. Therefore, (31) suggests that we must set the
noise variance of the target channel to $\alpha\sigma_w^2 = \lambda$.

In the FIR case, however, the problem of choosing the correct
value of this variance is somewhat ambiguous, because FIR solutions
do not satisfy the hypotheses in Theorem 2. We nominally
set the target-channel noise variance to $\lambda$ in the FIR case as
well. This is a good first approximation, and fine-tuning this
parameter may produce better results.